Towards Automatic Extraction of Argument Structure from Corpora
Abstract
The valency of predicates is a key component of a lexical entry because most, if not all, recent syntactic theories `project' syntactic structure from such information in the lexicon (e.g. Pollard & Sag, 1987). Therefore, a wide-coverage robust parser utilising a grammar based on one of these theories must have access to an accurate dictionary encoding (at a minimum) valency information and probably further details of argument structure. However, as designers of natural language processing systems have observed (e.g. Jensen, 1991), valency is closely associated with lexical sense, and the senses of a word change between corpora, sublanguages and/or subject domains. Jensen et al. (1994) take this as evidence that the coupling between syntactic parsing and valency information should be much weaker than in current syntactic theories. From a more theoretical standpoint, Grimshaw (1990), Pustejovsky (1993) and others have argued that valency should instead be projected from lexical semantic information. In a recent experiment with a wide-coverage parsing system utilising a grammatical framework based on standard lexicalist assumptions, Briscoe & Carroll (1993) observed that over half the analysis failures on unseen corpus examples were caused by incorrect subcategorisation for predicate valency. Because of the close connection between sense and valency and between subject domain and sense, it may be that a fully accurate `static' valency dictionary of the language is unattainable. The work we describe below could equally support a large-scale attempt to construct such a dictionary from substantial quantities of corpus material, or the less ambitious and more frequent construction of `disposable' dictionaries or augmentation of `self-updating' dictionaries as and when new corpora need to be parsed.
We have developed a system which is potentially capable of delivering putative lexical entries for predicates extracted from textual corpora, focussing on the acquisition of `argument structure' (defined as valency, semantic selectional restrictions/preferences, diathesis alternations, bounded dependency rules, such as passive or particle movement, and control of understood arguments in predicative complements), though so far we have mostly explored predictions with respect to valency. The approach we have adopted is to construct a `shallow' syntactic but global analysis of sentences for corpus material annotated with part-of-speech and punctuation mark sequences disambiguated by a tagger. We then extract relevant competing subanalyses surrounding a given predicate from all possible shallow analyses of sentences. These so-called patternsets for a given predicate are then evaluated using heuristics and a simple probabilistic approximation of the correctness of a given pattern, so …
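The abstract is cut off before it says how patternsets are scored, so the following is only an illustrative sketch and not the authors' method. It assumes, hypothetically, that each corpus occurrence of a predicate yields a set of competing candidate frames, that one unit of evidence is split evenly among them, and that a frame is kept when its relative frequency passes a threshold. The function name `score_patterns`, the frame labels, and the `min_rel_freq` parameter are all invented for illustration.

```python
from collections import Counter, defaultdict


def score_patterns(observations, min_rel_freq=0.2):
    """Estimate a relative-frequency score for each candidate frame.

    observations: iterable of (predicate, candidate_frames) pairs, one per
    corpus occurrence. candidate_frames holds the competing frames proposed
    by the shallow analyses of that occurrence (hypothetical labels).
    Returns {predicate: {frame: score}} for frames with score >= min_rel_freq.
    """
    weighted_counts = defaultdict(Counter)
    occurrences = Counter()

    for predicate, frames in observations:
        unique_frames = set(frames)
        if not unique_frames:
            continue  # no analysis was found for this occurrence
        occurrences[predicate] += 1
        # Assumption: competing frames share one unit of evidence equally.
        weight = 1.0 / len(unique_frames)
        for frame in unique_frames:
            weighted_counts[predicate][frame] += weight

    lexicon = {}
    for predicate, frame_counts in weighted_counts.items():
        total = occurrences[predicate]
        lexicon[predicate] = {
            frame: count / total
            for frame, count in frame_counts.items()
            if count / total >= min_rel_freq
        }
    return lexicon


if __name__ == "__main__":
    sample = [
        ("give", ["NP_NP", "NP_PP"]),  # ambiguous occurrence
        ("give", ["NP_NP"]),
        ("give", ["NP_PP"]),
        ("give", ["NP"]),
    ]
    print(score_patterns(sample, min_rel_freq=0.3))
```

Splitting evidence evenly among competing analyses is only one simple option. The heuristics mentioned in the abstract would presumably weight some analyses above others.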
Similar resources
Object-Oriented Method for Automatic Extraction of Road from High Resolution Satellite Images
As the information carried in a high spatial resolution image is not represented by single pixels but by meaningful image objects, which include the association of multiple pixels and their mutual relations, the object based method has become one of the most commonly used strategies for the processing of high resolution imagery. This processing comprises two fundamental and critical steps towar...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
The Effect of Dynamic Assessment of Toulmin Model through Teacher- and Collective-Scaffolding on Argument Structure and Argumentative Writing Achievement of Iranian EFL Learners
Considering the paramount importance of writing logical arguments for college students, this study investigated the effect of dynamic assessment (DA) of Toulmin model through teacher- and collective-scaffolding on argument structure and overall quality of argumentative essays of Iranian EFL university learners. In so doing, 45 male and female Iranian EFL learners taking part in the study were r...
UDLex: Towards Cross-language Subcategorization Lexicons
This paper introduces UDLex, a computational framework for the automatic extraction of argument structures for several languages. By exploiting the versatility of the Universal Dependency annotation scheme, our system acquires subcategorization frames directly from a dependency parsed corpus, regardless of the input language. It thus uses a universal set of language-independent rules to detect ...
Acquiring Reliable Predicate-argument Structures from Raw Corpora for Case Frame Compilation
We present a method for acquiring reliable predicate-argument structures from raw corpora for automatic compilation of case frames. Such lexicon compilation requires highly reliable predicate-argument structures to practically contribute to Natural Language Processing (NLP) applications, such as paraphrasing, text entailment, and machine translation. We first apply chunking to raw corpora and t...
TBXTools: A Free, Fast and Flexible Tool for Automatic Terminology Extraction
The manual identification of terminology from specialized corpora is a complex task that needs to be addressed by flexible tools, in order to facilitate the construction of multilingual terminologies which are the main resources for computer-assisted translation tools, machine translation or ontologies. The automatic terminology extraction tools developed so far either use a proprietary code or...
Publication date: 1995